I am making this website for myself to learn about web design and development. I use various AI tools to write HTML and CSS, because I love seeing quick results and learn well by looking at and modifying code. I do not use AI to write content on this page unless specified otherwise. I do not use AI to generate images or other media for this page. I will experiment with generated ASCII-art, because I feel that is ethical-ish.

When I first wrote this disclaimer, it got me to reflect more deeply on where I draw the line and why. I am not sure I have a definitive conclusion.

I would describe myself as a skeptic of AI and a hater of its evangelists and most implementations/’products’. I do not like the way that so much of it tastes of the same get-rich-quick, move-fast-break-things, quasi-religious scam soup that comprises most of the modern technological scene.

I am not an artist and have not yet found AI-generated art that I enjoy. Notable exceptions are the Reggae Wars videos and whatever EmergentGarden gets up to, but in this instance the exceptions prove the rule. I agree with the commonly used arguments that conclude that generative AI images and media are unethical. The core consideration is that the models are trained on artists’ material without consent or remuneration.

Visual and audio art are often complex mixtures of analog and digital media. Even when they consist of one simple, atomic medium (think clay sculptures or pigment on a surface), they are multidimensional and scrumptiously human. Digital representations of such art, boiled down into pixels and binary sound, are then gobbled up into models that anonymize the humans and machines that created the originals. The output of such models is a remix of its inputs, and the authors are untraceable. Even a ghiblified avatar contains traces of thousands of human artists, not only those at Studio Ghibli. All those humans are invisible in the final result. Their art, however notable or forgotten, however large or small, loved or hated, is now just a few pixels in a collage of nobodies’ work.

Conveniently, I feel somewhat differently about generated text and, by extension, code. Text is inherently digital and atomic. Computers speak binary, and letters are trivial to represent in binary. Text is democratically unhuman. It is as easy for us to parse as it is for the computer, and likely for any intelligent life that might or might not be out there. A blackmail letter cut from newspapers and magazines does not deprive the authors of the fluff piece on wedding dresses and the adverts for healing crystals of their ownership of the letter ‘e’ or the ampersand. Since Gutenberg invented the printing press, the written word has been infinitely copyable. Since the first symbolic representations were used to keep records of whatever was worth recording, it was likely understood that those symbols were not proprietary. Making them proprietary would render them useless as tools of communication (quite apart from the impracticality of licensing a hieroglyph).

But enough with the philosophical nonsense about the different nature of pixels and letters. You might object, ‘Generating text still steals the intellectual property of whoever wrote the stuff, right? Surely the thoughts expressed are the point, not the letters!’ and I would agree with you (person I made up to agree with me). But because of that copyability, and because text is intuitively and openly composed of atomic components, we have always treated written content differently from other media. We accept as a fact that text is easy to copy, and we have conventions for referring to sources properly when we do so. How strictly depends on what we are copying from and where we are copying to. In most contexts outside academia, it is viewed as a minor offense to misattribute a quote to Einstein or to lift some text from somewhere. Here I go, sue me:

I was a Flower of the mountain yes when I put the rose in my hair like the Andalusian girls used or shall I wear a red yes and how he kissed me under the Moorish wall and I thought well as well him as another and then I asked him with my eyes to ask again yes and then he asked me would I yes to say yes my mountain flower and first I put my arms around him yes and drew him down to me so he could feel my breasts all perfume yes and his heart was going like mad and yes I said yes I will Yes.

If text is distinct enough, it can easily be traced back to its source. You can copy the quote above and search the internet with it. It will be clear where it came from. You can do the same with the rest of this rant and should not find any source other than itself. There are plagiarism-detection programs that automate this process. You might still say that the ghiblified image from the example above is equivalent to asking an LLM to write a parking ticket in the style of Sylvia Plath. Not only offensive to Sylvia, but also to the millions of redditors and academics whose posts and books were cooked into the model; who become nobodies through the model. To me that does not feel true to the same extent. I can’t put my finger on it more clearly than that.

For code in particular, of course, copying bits together from somewhere without referring to the source, or ever remembering which bits were copied and which were yours, is such normal practice that I don’t feel the need to justify using LLMs to code. They just let you copy snippets you don’t understand at hyperspeed. Of course, giving credit is good practice in coding, and many forms of copying without attribution or permission are rude or against licensing. But no one who has written code after 1990 has done so without copying unattributed bits here and there.

Anyway, I wanted to write a short disclaimer and ended up writing my first blog post, without even getting to the emissions that AI services generate. I plan to outline the AI tools I use, and how I use them, in the future.